A client wants you to predict data scientist salaries with machine learning.
June 7, 2018
Machine learning is a method for teaching computers to make and improve predictions or behaviours based on data.
Kaggle conducted an industry-wide survey of data scientists. https://www.kaggle.com/kaggle/kaggle-survey-2017
Information asked:
Contains information from Kaggle ML and Data Science Survey, 2017, which is made available here under the Open Database License (ODbL).
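The task can be sketched end to end. This is a minimal, hypothetical setup: it uses synthetic stand-in features (the real task would load the Kaggle 2017 survey CSV), and all variable names and the choice of a random forest are illustrative assumptions, not the survey's actual columns.

```python
# Hypothetical sketch of the salary-prediction task.
# Data is synthetic; a real analysis would parse the Kaggle survey CSV.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(20, 65, n)          # stand-ins for survey answers
experience = rng.integers(0, 30, n)
skill = rng.random(n)                  # e.g. a coded skill level
salary = 30000 + 2000 * experience + 40000 * skill + rng.normal(0, 5000, n)

X = np.column_stack([age, experience, skill])
X_train, X_test, y_train, y_test = train_test_split(X, salary, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.2f}")
```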
Client: "There is a problem with the model!"
"What problem?"
Client: "Model predicts high salaries for old yet unskilled people."
Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.2307/2699986
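A complaint like "high salaries for old yet unskilled people" is exactly what ICE plots (Goldstein et al.) and their average, the partial dependence plot (Friedman), make visible: vary one feature over a grid while holding everything else fixed, and look at the predicted curves. A minimal sketch, assuming a fitted model; the data and the `ice_curves` helper are invented here for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 10 * X[:, 0] + rng.normal(0, 0.1, 200)   # target depends mainly on feature 0

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def ice_curves(model, X, feature, grid):
    """One predicted curve per instance: replace `feature` with each grid
    value while keeping all other feature values fixed."""
    curves = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = model.predict(X_mod)
    return curves

grid = np.linspace(0, 1, 20)
ice = ice_curves(model, X, feature=0, grid=grid)
pdp = ice.mean(axis=0)   # the PDP is the pointwise average of the ICE curves
print(pdp[0], pdp[-1])   # the PDP rises with feature 0 here
```

scikit-learn ships the same idea as `sklearn.inspection.partial_dependence`; the manual loop above just makes the mechanics explicit.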
Client: "We want to understand the model better!"
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Retrieved from http://arxiv.org/abs/1602.04938
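The core idea of LIME (Ribeiro et al.) can be sketched without the library: sample perturbations around one instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. Everything below (the black-box target, the kernel width, the `local_surrogate` helper) is an illustrative assumption, not the `lime` package's actual API:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = 5 * X[:, 0] ** 2 + X[:, 1]               # nonlinear "black box" target

black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def local_surrogate(model, x, n_samples=500, width=0.3):
    """LIME-style local explanation: fit a proximity-weighted linear
    model to the black box's predictions near instance x."""
    Z = x + rng.normal(0, width, size=(n_samples, len(x)))
    preds = model.predict(Z)
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (2 * width ** 2))  # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_

x = np.array([0.8, 0.5, 0.5])
coefs = local_surrogate(black_box, x)
print(coefs)   # feature 0 should dominate near x[0] = 0.8
```

The actual `lime` package adds interpretable feature representations and discretization on top of this; the weighted linear fit is the heart of it.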
Interpretability is the degree to which a human can understand the cause of a decision. Unlike benchmarking ML algorithms on predictive performance, we have no agreed-upon way to measure it.
Interpretability is also a means to examine the data and uncover possible issues with it.
Miller, T. (2017). Explanation in Artificial Intelligence: Insights from the Social Sciences. arXiv preprint arXiv:1706.07269.
We need interpretability when the loss function does not cover all the constraints of the problem.
We can do without it when everything is captured in the loss function and the data collection: only causal relationships, a perfect operationalization of the features, and a well-defined problem. Well-defined problems are where machine learning works best.
TODO: add images from papers
TODO: Cite
Read my book about "Interpretable Machine Learning" https://christophm.github.io/interpretable-ml-book/